Converting SynTagRus Dependency Treebank into Penn Treebank Style
Abstract
This paper presents the conversion of SynTagRus dependency structures into Penn Treebank style phrase structures. The resulting data will be used to train a statistical constituency parser for Russian and to create a large-scale constituency-parsed corpus. The conversion includes various innovative features designed to produce phrase structure trees that are as close as possible to Penn Treebank style while preserving as much information from the original dependency annotations as possible. We believe the converted phrase structure treebank will be not only an adequate training dataset for our ongoing project but also a valuable resource for traditional and computational linguistic research.
Similar resources
Conversion of Syntagrus (the Russian Dependency Treebank) to Universal Dependencies
This report presents the Universal Dependency (UD) annotated corpus for Russian and a conversion process which was developed to transform SynTagRus, the Russian dependency treebank, into a UD-style annotated corpus. The aim of this work was to create a UD-style annotated corpus for Russian since no such corpus was available prior to UD release 1.3. The conversion rules were based on manually an...
Parsing the SynTagRus Treebank of Russian
We present the first results on parsing the SYNTAGRUS treebank of Russian with a data-driven dependency parser, achieving a labeled attachment score of over 82% and an unlabeled attachment score of 89%. A feature analysis shows that high parsing accuracy is crucially dependent on the use of both lexical and morphological features. We conjecture that the latter result can be generalized to richl...
Efficient Third-Order Dependency Parsers
We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n^4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank,...
Comparing linguistic information in treebank annotations
The paper investigates the issue of portability of methods and results over treebanks in different languages and annotation formats. In particular, it addresses the problem of converting an Italian treebank, the Turin University Treebank (TUT), developed in dependency format, into the Penn Treebank format, in order to possibly exploit the tools and methods already developed and compare the adeq...
CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank
This article presents an algorithm for translating the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations augmented with local and long-range word–word dependencies. The resulting corpus, CCGbank, includes 99.4% of the sentences in the Penn Treebank. It is available from the Linguistic Data Consortium, and has been used to train wide-coverage statistical parsers that ob...
Publication date: 2016